Line 8 of right top column, page 3 to line 4 of left top column, page 5

The point of the present invention is to enable voice and PB signals to
be used together as an information input means. To this end, each PB signal
is added to the voice recognition unit 13, already described with reference
to FIG. 1, as a phoneme and as one word, so that PB signals are detected in
exactly the same format as voice recognition. However, the invention does
not assume that a voice signal and a PB signal are present simultaneously.

First, we consider the simple case in which frame-by-frame phoneme
recognition takes place for each and every one of the 16 pairs of phoneme
patterns. In this case, the 16 kinds of PB signals are allocated, one each,
as pseudo-phoneme norms to the 16 pairs, and the feature patterns necessary
for their detection may be stored in the phoneme norm pattern memory. The
word dictionary entries associated with PB signals may be configured to
require that the pseudo-phoneme norm for the same PB signal persist for at
least the time within which reception and detection must take place (for
instance, 40 milliseconds or longer under the prevailing regulations).
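
The duration condition above can be illustrated with a minimal sketch. It
assumes that recognition yields one pseudo-phoneme label per frame and that
the frame period is 10 ms; the frame period, the label strings, and the
function names are hypothetical and are not taken from the original.

```python
def longest_run(labels, target):
    """Length of the longest run of consecutive frames labeled `target`."""
    best = current = 0
    for label in labels:
        current = current + 1 if label == target else 0
        best = max(best, current)
    return best


def satisfies_duration(labels, target, frame_ms=10.0, min_ms=40.0):
    """True if `target` persists for at least `min_ms` of consecutive frames.

    frame_ms is an assumed frame period; min_ms follows the 40 ms example.
    """
    return longest_run(labels, target) * frame_ms >= min_ms


# Four consecutive 10 ms frames of the same pseudo-phoneme reach 40 ms.
accepted = satisfies_duration(["PB5"] * 4, "PB5")
# A run interrupted after two frames does not.
rejected = satisfies_duration(["PB5", "PB5", "a", "PB5", "PB5"], "PB5")
```
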

Next, we consider the case of hierarchical processing, in which the
first column is recognized using only 2 representative clusters. The
following is then required:

1) Recognition of the first column should detect that the input is a PB
signal; and

2) If the input has been detected as a PB signal in recognition of the first
column, recognition of the second column should determine which PB signal
it is.

This is described more specifically below.

We now consider the case in which phoneme recognition uses matching
by a likelihood ratio based on LPC (linear prediction) analysis.

Normally, an analysis of order p = 10 is conducted for audio signals
band-limited to 0.3 kHz to 3.4 kHz.

As a result of this analysis, in principle, p/2 spectral resonant
frequencies, i.e., so-called formant frequencies, are specified. In other
words, when p = 10, 5 frequencies can be specified. If these 5 frequencies
are allocated across both the low and high frequency bands, as shown in
FIG. 4 (for example, 2 in one band and 3 in the other, giving 2 x 3 = 6
combinations), one pair can be set to cover 6 of the 16 PB signals, and 2
pairs can cover 12.

In FIG. 4, the numerals 1, 2, 3, 4, and 5 denote the frequencies
allocated by the #1 cluster, and I, II, III, IV, and V denote the
frequencies allocated by the #16 cluster.

Of the 16 PB signals, it is common to use the 10 digits and 2 control
signals (for instance, the * mark and the # mark), so it suffices if 12
signals can be detected. In Japan, 4 lower frequencies (697, 770, 852, and
941 Hz) are available, but only 3 higher frequencies (1209, 1336, and
1477 Hz).
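
As an illustration of how 5 specified frequencies can cover 6 signals, the
sketch below uses the standard PB (DTMF) keypad layout, in which each key is
one lower and one higher frequency. The split of 2 lower plus 3 higher
frequencies per cluster is a hypothetical allocation chosen for the example;
the actual allocation of FIG. 4 is not reproduced here.

```python
from itertools import product

# Standard PB (DTMF) keypad: row (low) x column (high) frequencies, in Hz.
LOW = (697, 770, 852, 941)
HIGH = (1209, 1336, 1477)
KEYS = ("123", "456", "789", "*0#")

# Map each (low, high) frequency pair to its key.
PAIR_TO_KEY = {
    (lo, hi): KEYS[r][c]
    for r, lo in enumerate(LOW)
    for c, hi in enumerate(HIGH)
}


def covered_keys(low_subset, high_subset):
    """Keys whose two tones both belong to the given frequency subsets."""
    return [PAIR_TO_KEY[pair] for pair in product(low_subset, high_subset)]


# Hypothetical allocations of 5 frequencies each (2 low + 3 high).
cluster_a = covered_keys(LOW[:2], HIGH)   # rows 697 Hz and 770 Hz
cluster_b = covered_keys(LOW[2:], HIGH)   # rows 852 Hz and 941 Hz
```

With this split, each cluster covers 6 keys and the two together cover all 12.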

Parameters for detecting these can be derived from the following
equation:



where the specified frequencies are {f_i} = (f_1, f_2, f_3, f_4, f_5), T is
the sampling period, and b_i is the resonant bandwidth of f_i. Since the
allowed fluctuation of the signal frequency of a PB signal is ±2%, b_i may
be taken as approximately f_i x 4%.

From this, the tenth-degree equation shown below is generated

(1)

and then substituted into the following

(2)



Then, the coefficients of the powers of Z in equations (1) and (2) are
taken as a_1 ... a_10, and [a_1 ... a_10] can thus be determined.
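
Since the equations themselves do not appear above, the following is only a
sketch of one common way to obtain such coefficients. It assumes that each
specified frequency f_i is modeled as a conjugate pole pair at radius
exp(-pi * b_i * T) and angle 2 * pi * f_i * T, and that the five resulting
second-order factors are multiplied out into a tenth-degree polynomial in
z^-1. The pole model, the 8 kHz sampling rate (T = 125 microseconds), the
example frequency set, and the function name are assumptions, not taken from
the original.

```python
import math


def resonator_polynomial(freqs_hz, sample_rate_hz=8000.0, rel_bandwidth=0.04):
    """Return [a_0, a_1, ..., a_2n] with a_0 = 1 for n resonant frequencies.

    Each frequency f contributes the factor
        1 - 2 r cos(2 pi f T) z^-1 + r^2 z^-2,  r = exp(-pi b T),
    where b = rel_bandwidth * f (4% of f by default).
    """
    T = 1.0 / sample_rate_hz
    coeffs = [1.0]
    for f in freqs_hz:
        b = rel_bandwidth * f
        r = math.exp(-math.pi * b * T)
        section = [1.0, -2.0 * r * math.cos(2.0 * math.pi * f * T), r * r]
        # Multiply the running polynomial by this second-order section.
        expanded = [0.0] * (len(coeffs) + 2)
        for i, c in enumerate(coeffs):
            for j, s in enumerate(section):
                expanded[i + j] += c * s
        coeffs = expanded
    return coeffs


# Hypothetical set of 5 frequencies (2 lower + 3 higher PB frequencies).
a = resonator_polynomial([697, 770, 1209, 1336, 1477])
```

Here `a[0]` is 1 and `a[1]` to `a[10]` play the role of a_1 ... a_10.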

An inverse spectrum coefficient to be used as a phoneme norm pattern
can be determined, as shown below, as the correlation coefficient obtained
by prepending 1, as a_0, to this series of a:
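
As a hedged illustration of this step, the sketch below computes the
autocorrelation of the coefficient series [a_0 = 1, a_1, ..., a_p], that is,
A_l = sum over k of a_k * a_(k+l) for l = 0 to p. This is the usual
definition of such a correlation coefficient in LPC matching and is assumed
here; the function name is hypothetical.

```python
def coefficient_autocorrelation(a):
    """Autocorrelation A_0..A_p of the series a = [a_0, a_1, ..., a_p]."""
    p = len(a) - 1
    return [
        sum(a[k] * a[k + lag] for k in range(p - lag + 1))
        for lag in range(p + 1)
    ]


# Small first-order example: a = [1, -0.5].
A_example = coefficient_autocorrelation([1.0, -0.5])
```

For the first-order example, A_0 = 1 + 0.25 and A_1 = -0.5.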



Phoneme parameters stored in the #2 to #15 clusters can be
determined by LPC analysis of actual individual PB signals.

In practice, moreover, it is not necessary to cover all 12 PB signals
in the second column. In the example shown in FIG. 4, the #1 cluster can
specify 6 kinds and the #16 cluster can specify 6 kinds, so recognition
trials for the second column need only be carried out for those 6 kinds.

We now describe one embodiment of the present invention with 
reference to FIG. 2. 

Input voice 20 (which may be a PB signal as a pseudo-voice waveform)
is subjected to LPC (linear prediction) analysis at the voice analyzing
unit 21, where the correlation coefficients {r_l^(x)} and the residual
power E_0^(x) are computed.

Then, for every frame, a likelihood ratio is calculated at the distance
computing unit 23 by the following equation, from the phoneme norm patterns
{A_l^(n)}, where l = 0 to 10 and n = 1 to S, the correlation coefficients
{r_l^(x)}, where l = 0 to 10, and E_0^(x):

(3) 
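
The equation itself does not appear above. A commonly used form of the LPC
likelihood ratio, assumed here purely for illustration, is
(A_0 * r_0 + 2 * sum of A_l * r_l for l = 1 to p) / E_0, which equals 1 when
the norm pattern matches the input's own LPC solution and exceeds 1
otherwise. The function name and the first-order example values are
hypothetical.

```python
def likelihood_ratio(A, r, E0):
    """Assumed form: (A_0 r_0 + 2 * sum_{l>=1} A_l r_l) / E_0.

    A  : norm-pattern correlation coefficients A_0..A_p
    r  : input correlation coefficients r_0..r_p
    E0 : residual power of the input frame
    """
    cross = sum(Al * rl for Al, rl in zip(A[1:], r[1:]))
    return (A[0] * r[0] + 2.0 * cross) / E0


# First-order example: input r = [1, 0.5] has the LPC solution a_1 = -0.5
# with residual power E0 = 1 - 0.5**2 = 0.75.
matched = likelihood_ratio([1.25, -0.5], [1.0, 0.5], 0.75)   # pattern a = [1, -0.5]
mismatched = likelihood_ratio([1.0, 0.0], [1.0, 0.5], 0.75)  # pattern a = [1, 0]
```
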

Then, DP matching is performed between the input phoneme series
matrix, with the likelihood ratio as the measure, and the word dictionary of
phoneme symbol series, and the entry with the optimum match is output as
the recognition result. In that case, as already discussed, of the 16 pairs
of phoneme clusters, recognition of the first column uses only 2
representative ones, e.g., #1 (male voice representative) and #16 (female
voice representative), narrowing the candidates down to N words. Then, if
the pattern for PB signal detection that has been added to the #1 and #16
clusters determines that the first candidate is a PB signal, recognition in
the second column takes place with 6 of the 12 kinds of PB signals as the N
candidates. Everything else is exactly the same as in conventional voice
recognition.
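
The two-column flow can be sketched as follows. This is a deliberately
simplified, single-decision illustration: it omits DP matching and the word
dictionary, and it assumes a caller-supplied `score` function in which a
smaller value means a better match. The "PB:" label prefix, the group names,
and all function and parameter names are hypothetical.

```python
def two_stage_recognize(score, first_stage_patterns, pb_patterns_by_group):
    """Return ("voice", label) or ("pb", key).

    score(pattern)       -> distance of the input to a pattern (smaller is better)
    first_stage_patterns -> {label: pattern}; labels "PB:<group>" mark the
                            PB detection patterns added to the representative
                            clusters
    pb_patterns_by_group -> {group: {key: pattern}} used in the second column
    """
    # First column: compare against the representative clusters only.
    best = min(first_stage_patterns, key=lambda lab: score(first_stage_patterns[lab]))
    if not best.startswith("PB:"):
        return ("voice", best)
    # Second column: decide which PB signal of the detected group it is.
    candidates = pb_patterns_by_group[best[len("PB:"):]]
    key = min(candidates, key=lambda k: score(candidates[k]))
    return ("pb", key)


# Toy usage with scalar "patterns" and absolute difference as the distance.
first = {"a": 1.0, "PB:low": 5.0}
groups = {"low": {"1": 4.8, "2": 5.3}}
result = two_stage_recognize(lambda p: abs(p - 5.1), first, groups)
```
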

In this case, if a cluster for PB signals were configured as a 17th
pair, instead of adding one pseudo-phoneme pattern associated with each
individual PB signal to the 2nd to 15th pairs, the configuration could be
such that, when the candidate is detected as a PB signal in the first
column, frame-by-frame phoneme recognition need only take place for this
cluster.

In addition, in commonly practiced voice recognition that uses more
than one norm pattern at the word level, it is obvious that, for PB signals,
only the first-column recognition described for the 16 pairs of clusters,
each and every one of which is subjected to voice recognition, is
sufficient.

As described above, according to the present invention, voice and PB
signals can both be used as information input means over the telephone
without any distinction being made between them, which enables information
input that takes advantage of both the convenience of voice input and the
reliability of PB input.

For instance, one possible use is to input by voice only control words,
which are relatively long and can easily exploit context effects, while
inputting by PB numeric data, which is short and cannot exploit context
effects.

Alternatively, those who can use PB phones can be offered reliable
PB input, while voice input can serve those who cannot.

FIG. 2

1 Phoneme norm pattern memory 

2 Word dictionary memory 

3 Voice analyzing unit 

4 Distance calculating unit 

5 DP matching unit 

6 Word judging unit 



FIG. 4 

1 High frequency 

2 Low frequency 